In recent hyperspectral unmixing (HU) literature, the application of deep learning (DL) has become more prominent, especially with the autoencoder (AE) architecture. We propose a split architecture and use a pseudo-ground truth for abundances to guide the `unmixing network' (UN) optimization. Preceding the UN, an `approximation network' (AN) is proposed, which will improve the association between the centre pixel and its neighbourhood. Hence, it will accentuate spatial correlation in the abundances as its output is the input to the UN and the reference for the `mixing network' (MN). In the Guided Encoder-Decoder Architecture for Hyperspectral Unmixing with Spatial Smoothness (GAUSS), we proposed using one-hot encoded abundances as the pseudo-ground truth to guide the UN; computed using the k-means algorithm to exclude the use of prior HU methods. Furthermore, we release the single-layer constraint on MN by introducing the UN generated abundances in contrast to the standard AE for HU. Secondly, we experimented with two modifications on the pre-trained network using the GAUSS method. In GAUSS$_\textit{blind}$, we have concatenated the UN and the MN to back-propagate the reconstruction error gradients to the encoder. Then, in the GAUSS$_\textit{prime}$, abundance results of a signal processing (SP) method with reliable abundance results were used as the pseudo-ground truth with the GAUSS architecture. According to quantitative and graphical results for four experimental datasets, the three architectures either transcended or equated the performance of existing HU algorithms from both DL and SP domains.
translated by 谷歌翻译
Covid-19大流行导致了前所未有的全球公共卫生危机。鉴于其固有的性质,建议社会疏散措施作为遏制这种大流行传播的主要策略。因此,识别违反这些协议的情况,对削减疾病的传播并促进可持续生活方式具有影响。本文提出了一种基于电脑视觉的基于计算机视觉的系统,分析了CCTV镜头,为Covid-19传播提供了威胁水平评估。该系统努力捕获跨越多个帧的CCTV镜头的信息内容,以识别各个帧的各种违反社会偏移协议的实例,以及跨空间的识别,以及组行为的识别。该功能主要是通过利用基于时间图的基础结构来实现CCTV镜头的信息和对全能解释图的策略并量化给定场景的威胁级别的策略。在一系列场景中测试并验证各个组件,并针对人类专家意见进行了完整的系统。结果反映了威胁水平对人,其物理接近,相互作用,防护服和群体动力学的依赖。系统性能的准确性为76%,从而在城市进行了可部署的威胁监控系统,以允许社会中的正常和可持续性。
translated by 谷歌翻译
评估胎儿和母亲的健康对于预防和识别怀孕可能的并发症至关重要。本文重点介绍了母亲自己能够用最小的监督和胎儿和产妇健康有效地使用的装置,同时安全,舒适,易于使用。所提出的设备使用母亲子宫内的单个加速度计的带以记录所需信息。该设备预计将在长期长期监测母亲和胎儿,并提供具有有用信息的医疗专业人员,否则他们将由于目前进行健康监测的频率低频率而忽略。本文表明,即使在存在温和的干扰情况下,母亲和胎儿运动的呼吸信息的同时测量实际上是可能的,如果预计该设备延长延长,则需要考虑该装置。
translated by 谷歌翻译
内在图像分解是生成图像的成分的打开问题。从单个图像产生反射率和阴影是一个具体的任务,特别是当没有地面真相时。缺乏无监督的学习方法,用于使用单个图像将图像分解成反射率和阴影。我们提出了一种神经网络架构,其能够使用从图像导出的基于物理的参数进行这种分解。通过实验结果,我们展示了(a)所提出的方法优于现有的基于深度学习的IID技术和(b)衍生参数显着提高疗效。我们得出结论,对结果(数值和示例图像)的仔细分析,显示了几个途径以改进。
translated by 谷歌翻译
Differentiable rendering aims to compute the derivative of the image rendering function with respect to the rendering parameters. This paper presents a novel algorithm for 6-DoF pose estimation through gradient-based optimization using a differentiable rendering pipeline. We emphasize two key contributions: (1) instead of solving the conventional 2D to 3D correspondence problem and computing reprojection errors, images (rendered using the 3D model) are compared only in the 2D feature space via sparse 2D feature correspondences. (2) Instead of an analytical image formation model, we compute an approximate local gradient of the rendering process through online learning. The learning data consists of image features extracted from multi-viewpoint renders at small perturbations in the pose neighborhood. The gradients are propagated through the rendering pipeline for the 6-DoF pose estimation using nonlinear least squares. This gradient-based optimization regresses directly upon the pose parameters by aligning the 3D model to reproduce a reference image shape. Using representative experiments, we demonstrate the application of our approach to pose estimation in proximity operations.
translated by 谷歌翻译
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community, but have not received as much attention as lower-level tasks like speech and speaker recognition. In particular, there are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark datasets for several tasks. In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape. We contribute four tasks: question answering and summarization involve inference over longer speech sequences; named entity localization addresses the speech-specific task of locating the targeted content in the signal; dialog act classification identifies the function of a given speech utterance. We follow the blueprint of the Spoken Language Understanding Evaluation (SLUE) benchmark suite. In order to facilitate the development of SLU models that leverage the success of pre-trained speech representations, we will be publishing for each task (i) annotations for a relatively small fine-tuning set, (ii) annotated development and test sets, and (iii) baseline models for easy reproducibility and comparisons. In this work, we present the details of data collection and annotation and the performance of the baseline models. We also perform sensitivity analysis of pipeline models' performance (speech recognizer + text model) to the speech recognition accuracy, using more than 20 state-of-the-art speech recognition models.
translated by 谷歌翻译
Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community. Recent research efforts have quickly led to the design of novel algorithms able to reduce the impact of the catastrophic forgetting phenomenon in deep neural networks. Due to this surge of interest in the field, many competitions have been held in recent years, as they are an excellent opportunity to stimulate research in promising directions. This paper summarizes the ideas, design choices, rules, and results of the challenge held at the 3rd Continual Learning in Computer Vision (CLVision) Workshop at CVPR 2022. The focus of this competition is the complex continual object detection task, which is still underexplored in literature compared to classification tasks. The challenge is based on the challenge version of the novel EgoObjects dataset, a large-scale egocentric object dataset explicitly designed to benchmark continual learning algorithms for egocentric category-/instance-level object understanding, which covers more than 1k unique main objects and 250+ categories in around 100k video frames.
translated by 谷歌翻译
Large language models have recently attracted significant attention due to their impressive performance on a variety of tasks. ChatGPT developed by OpenAI is one such implementation of a large, pre-trained language model that has gained immense popularity among early adopters, where certain users go to the extent of characterizing it as a disruptive technology in many domains. Understanding such early adopters' sentiments is important because it can provide insights into the potential success or failure of the technology, as well as its strengths and weaknesses. In this paper, we conduct a mixed-method study using 10,732 tweets from early ChatGPT users. We first use topic modelling to identify the main topics and then perform an in-depth qualitative sentiment analysis of each topic. Our results show that the majority of the early adopters have expressed overwhelmingly positive sentiments related to topics such as Disruptions to software development, Entertainment and exercising creativity. Only a limited percentage of users expressed concerns about issues such as the potential for misuse of ChatGPT, especially regarding topics such as Impact on educational aspects. We discuss these findings by providing specific examples for each topic and then detail implications related to addressing these concerns for both researchers and users.
translated by 谷歌翻译
Curiosity for machine agents has been a focus of lively research activity. The study of human and animal curiosity, particularly specific curiosity, has unearthed several properties that would offer important benefits for machine learners, but that have not yet been well-explored in machine intelligence. In this work, we conduct a comprehensive, multidisciplinary survey of the field of animal and machine curiosity. As a principal contribution of this work, we use this survey as a foundation to introduce and define what we consider to be five of the most important properties of specific curiosity: 1) directedness towards inostensible referents, 2) cessation when satisfied, 3) voluntary exposure, 4) transience, and 5) coherent long-term learning. As a second main contribution of this work, we show how these properties may be implemented together in a proof-of-concept reinforcement learning agent: we demonstrate how the properties manifest in the behaviour of this agent in a simple non-episodic grid-world environment that includes curiosity-inducing locations and induced targets of curiosity. As we would hope, our example of a computational specific curiosity agent exhibits short-term directed behaviour while updating long-term preferences to adaptively seek out curiosity-inducing situations. This work, therefore, presents a landmark synthesis and translation of specific curiosity to the domain of machine learning and reinforcement learning and provides a novel view into how specific curiosity operates and in the future might be integrated into the behaviour of goal-seeking, decision-making computational agents in complex environments.
translated by 谷歌翻译
This paper studies audio-visual suppression for egocentric videos -- where the speaker is not captured in the video. Instead, potential noise sources are visible on screen with the camera emulating the off-screen speaker's view of the outside world. This setting is different from prior work in audio-visual speech enhancement that relies on lip and facial visuals. In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression. We compare object recognition and action classification based visual feature extractors, and investigate methods to align audio and visual representations. Then, we examine different fusion strategies for the aligned features, and locations within the noise suppression model to incorporate visual information. Experiments demonstrate that visual features are most helpful when used to generate additive correction masks. Finally, in order to ensure that the visual features are discriminative with respect to different noise types, we introduce a multi-task learning framework that jointly optimizes audio-visual noise suppression and video based acoustic event detection. This proposed multi-task framework outperforms the audio only baseline on all metrics, including a 0.16 PESQ improvement. Extensive ablations reveal the improved performance of the proposed model with multiple active distractors, over all noise types and across different SNRs.
translated by 谷歌翻译